{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YjOR1n15wn1K"
      },
      "source": [
        "# qsv Quickstart on Google Colab\n",
        "\n",
        "<a target=\"_blank\" href=\"https://colab.research.google.com/github/jqnatividad/qsv/blob/master/contrib/notebooks/qsv-colab-quickstart.ipynb\">\n",
        "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\"/>\n",
        "</a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9CFiuF_abuL9"
      },
      "source": [
        "Get up and running with [qsv](https://github.com/jqnatividad/qsv) on [Google Colab](https://colab.google)!\n",
        "\n",
        "Simply [open this notebook in Google Colab](https://colab.research.google.com/github/jqnatividad/qsv/blob/master/contrib/notebooks/qsv-colab-quickstart.ipynb), sign in to your Google account, and **follow Parts 1 & 2 below**."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3vsTulc0OGqi"
      },
      "source": [
        "## Table of Contents\n",
        "\n",
        "1. [Setup](#1)\n",
        "  - 1.1 [Environment Notes](#1.1)\n",
        "  - 1.2 [Downloading qsv](#1.2)\n",
        "2. [Common Tasks](#2)\n",
        "  - 2.1 [Viewing Commands & Their Help Messages](#2.1)\n",
        "  - 2.2 [Adding Files](#2.2)\n",
        "3. [More Resources](#3)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iUJGEpSUMA7R"
      },
      "source": [
        "<a id=\"1\" name=\"1\"></a>\n",
        "## Part 1: Setup"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "v05J1AsdXAgT"
      },
      "source": [
        "<a id=\"1.1\" name=\"1.1\"></a>\n",
        "### 1.1 Environment Notes\n",
        "\n",
        " - This notebook was run on a Google Colab runtime based on Ubuntu 22.04 LTS. You may need to modify the commands if you're running it locally, on a different OS (e.g., Windows), or without the required dependencies.\n",
        " - In Google Colab, prefix each qsv command with an exclamation point `!` so that it runs as a shell command. The prefix is not needed when running qsv in a terminal."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jThnX2bkBvZj"
      },
      "source": [
        "<a id=\"1.2\" name=\"1.2\"></a>\n",
        "### 1.2 Downloading qsv\n",
        "\n",
        "First, let's download qsv into our notebook from the [releases page](https://github.com/jqnatividad/qsv/releases). We'll use qsv 0.112.0:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "5E4Jy22ozjM8",
        "outputId": "f84fc371-7774-48d3-9077-3e3d147a720e"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
            "                                 Dload  Upload   Total   Spent    Left  Speed\n",
            "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\n",
            "100 73.4M  100 73.4M    0     0  32.0M      0  0:00:02  0:00:02 --:--:-- 45.1M\n",
            "Archive:  qsv-0.112.0-x86_64-unknown-linux-gnu.zip\n",
            "  inflating: qsv-0.112.0-files/README  \n",
            "  inflating: qsv-0.112.0-files/qsv   \n",
            "  inflating: qsv-0.112.0-files/qsv_glibc-2.31  \n",
            "  inflating: qsv-0.112.0-files/qsv_glibc-2.31_rust_version_info.txt  \n",
            "  inflating: qsv-0.112.0-files/qsv_nightly  \n",
            "  inflating: qsv-0.112.0-files/qsv_nightly_rust_version_info.txt  \n",
            "  inflating: qsv-0.112.0-files/qsvdp  \n",
            "  inflating: qsv-0.112.0-files/qsvdp_glibc-2.31  \n",
            "  inflating: qsv-0.112.0-files/qsvdp_nightly  \n",
            "  inflating: qsv-0.112.0-files/qsvlite  \n",
            "  inflating: qsv-0.112.0-files/qsvlite_glibc-2.31  \n",
            "  inflating: qsv-0.112.0-files/qsvlite_nightly  \n"
          ]
        }
      ],
      "source": [
        "# Downloading the .zip file that contains qsv\n",
        "!curl -LO https://github.com/jqnatividad/qsv/releases/download/0.112.0/qsv-0.112.0-x86_64-unknown-linux-gnu.zip\n",
        "# Unzipping the .zip file into a folder\n",
        "!unzip -o qsv-0.112.0-x86_64-unknown-linux-gnu.zip -d qsv-0.112.0-files\n",
        "# Copying the qsv binary file from the folder into /bin to use the qsv command anywhere on our system\n",
        "!cp qsv-0.112.0-files/qsv /bin"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nc-nKxgbmS8Q"
      },
      "source": [
        "Great, you can now use qsv on Google Colab!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OVTySttQtRmz"
      },
      "source": [
        "<a id=\"2\" name=\"2\"></a>\n",
        "## Part 2: Common Tasks"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aFLFh8HhuGWE"
      },
      "source": [
        "<a id=\"2.1\" name=\"2.1\"></a>\n",
        "### 2.1 Viewing Commands & Their Help Messages"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7VcW5CHxtQTm"
      },
      "source": [
        "To list the commands available in the qsv variant and version you are using, run `qsv` with no arguments:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "p249ilsYmfev",
        "outputId": "92738e36-8ab7-416b-bf44-7c4cfffd49c8"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "qsv is a suite of CSV command line utilities.\n",
            "\n",
            "Please choose one of the following 53 commands:\n",
            "    apply       Apply series of transformations to a column\n",
            "    behead      Drop header from CSV file\n",
            "    cat         Concatenate by row or column\n",
            "    count       Count records\n",
            "    dedup       Remove redundant rows\n",
            "    describegpt Infer extended metadata using a LLM\n",
            "    diff        Find the difference between two CSVs\n",
            "    enum        Add a new column enumerating CSV lines\n",
            "    excel       Exports an Excel sheet to a CSV\n",
            "    exclude     Excludes the records in one CSV from another\n",
            "    explode     Explode rows based on some column separator\n",
            "    extdedup    Remove duplicates rows from an arbitrarily large text file\n",
            "    extsort     Sort arbitrarily large text file\n",
            "    fetch       Fetches data from web services for every row using HTTP Get.\n",
            "    fetchpost   Fetches data from web services for every row using HTTP Post.\n",
            "    fill        Fill empty values\n",
            "    fixlengths  Makes all records have same length\n",
            "    flatten     Show one field per line\n",
            "    fmt         Format CSV output (change field delimiter)\n",
            "    foreach     Loop over a CSV file to execute bash commands (*nix only)\n",
            "    frequency   Show frequency tables\n",
            "    generate    Generate test data by profiling a CSV\n",
            "    headers     Show header names\n",
            "    help        Show this usage message\n",
            "    index       Create CSV index for faster access\n",
            "    input       Read CSVs w/ special quoting, skipping, trimming & transcoding rules\n",
            "    join        Join CSV files\n",
            "    joinp       Join CSV files using the Pola.rs engine\n",
            "    jsonl       Convert newline-delimited JSON files to CSV\n",
            "    luau        Execute Luau script on CSV data\n",
            "    partition   Partition CSV data based on a column value\n",
            "    pseudo      Pseudonymise the values of a column\n",
            "    rename      Rename the columns of CSV data efficiently\n",
            "    replace     Replace patterns in CSV data\n",
            "    reverse     Reverse rows of CSV data\n",
            "    safenames   Modify a CSV's header names to db-safe names\n",
            "    sample      Randomly sample CSV data\n",
            "    schema      Generate JSON Schema from CSV data\n",
            "    search      Search CSV data with a regex\n",
            "    searchset   Search CSV data with a regex set\n",
            "    select      Select, re-order, duplicate or drop columns\n",
            "    slice       Slice records from CSV\n",
            "    snappy      Compress/decompress data using the Snappy algorithm\n",
            "    sniff       Quickly sniff CSV metadata\n",
            "    sort        Sort CSV data in alphabetical, numerical, reverse or random order\n",
            "    sortcheck   Check if a CSV is sorted\n",
            "    split       Split CSV data into many files\n",
            "    sqlp        Run a SQL query against several CSVs using the Pola.rs engine\n",
            "    stats       Infer data types and compute summary statistics\n",
            "    table       Align CSV data into columns\n",
            "    tojsonl     Convert CSV to newline-delimited JSON\n",
            "    transpose   Transpose rows/columns of CSV data\n",
            "    validate    Validate CSV data for RFC4180-compliance or with JSON Schema\n",
            "\n",
            "sponsored by datHere - Data Infrastructure Engineering\n",
            "\n",
            "Checking GitHub for updates...\n",
            "Up to date (0.112.0)... no update required.\n"
          ]
        }
      ],
      "source": [
        "!qsv"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "E96iZ78Hmg1D"
      },
      "source": [
        "You can get more information about a specific command by passing it the `--help` option. For example, let's get the help message for qsv's `slice` command:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "C6MJlC6cmz1J",
        "outputId": "b8e54622-f13e-4d3f-96d4-5b4a913989ef"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Returns the rows in the range specified (starting at 0, half-open interval).\n",
            "The range does not include headers.\n",
            "\n",
            "If the start of the range isn't specified, then the slice starts from the first\n",
            "record in the CSV data.\n",
            "\n",
            "If the end of the range isn't specified, then the slice continues to the last\n",
            "record in the CSV data.\n",
            "\n",
            "This operation can be made much faster by creating an index with 'qsv index'\n",
            "first. Namely, a slice on an index requires parsing just the rows that are\n",
            "sliced. Without an index, all rows up to the first row in the slice must be\n",
            "parsed.\n",
            "\n",
            "Usage:\n",
            "    qsv slice [options] [<input>]\n",
            "    qsv slice --help\n",
            "\n",
            "slice options:\n",
            "    -s, --start <arg>      The index of the record to slice from.\n",
            "                           If negative, starts from the last record.\n",
            "    -e, --end <arg>        The index of the record to slice to.\n",
            "    -l, --len <arg>        The length of the slice (can be used instead\n",
            "                           of --end).\n",
            "    -i, --index <arg>      Slice a single record (shortcut for -s N -l 1).\n",
            "\n",
            "Common options:\n",
            "    -h, --help             Display this message\n",
            "    -o, --output <file>    Write output to <file> instead of stdout.\n",
            "    -n, --no-headers       When set, the first row will not be interpreted\n",
            "                           as headers. Otherwise, the first row will always\n",
            "                           appear in the output as the header row.\n",
            "    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n",
            "                           Must be a single character. (default: ,)\n"
          ]
        }
      ],
      "source": [
        "!qsv slice --help"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AIXkUm0lolWf"
      },
      "source": [
        "<a id=\"2.2\" name=\"2.2\"></a>\n",
        "### 2.2 Adding Files"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "heV7G3VYm_pF"
      },
      "source": [
        "Use the file explorer on the left to drag and drop files or to upload them from your Google Drive.\n",
        "\n",
        "You can also download files directly into this notebook's environment, which is useful when you'd rather not download very large files to your own system.\n",
        "\n",
        "Here's an example of downloading a CSV file to this notebook from a link and renaming it as `data.csv`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "mZO_FS7QzLN3",
        "outputId": "15a232bf-fc9b-4775-9a6d-2fe0a146aac8"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
            "                                 Dload  Upload   Total   Spent    Left  Speed\n",
            "100 34.3M    0 34.3M    0     0  6577k      0 --:--:--  0:00:05 --:--:-- 8483k\n"
          ]
        }
      ],
      "source": [
        "# Downloading the .csv file as data.csv\n",
        "!curl 'https://data.wa.gov/api/views/f6w7-q2d2/rows.csv?accessType=DOWNLOAD' -o data.csv"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZbEgoJnExmqu"
      },
      "source": [
        "Now you can run qsv commands on `data.csv`. For example, let's view its first 5 rows:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 21,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "M9MMA702xpYt",
        "outputId": "d53bf797-3bcb-4cd9-e6eb-d4655d05f5e1"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "VIN (1-10),County,City,State,Postal Code,Model Year,Make,Model,Electric Vehicle Type,Clean Alternative Fuel Vehicle (CAFV) Eligibility,Electric Range,Base MSRP,Legislative District,DOL Vehicle ID,Vehicle Location,Electric Utility,2020 Census Tract\n",
            "5UXTA6C03P,King,Seattle,WA,98177,2023,BMW,X5,Plug-in Hybrid Electric Vehicle (PHEV),Clean Alternative Fuel Vehicle Eligible,30,0,36,218985539,POINT (-122.38242499999996 47.77279000000004),CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA),53033001600\n",
            "1FMCU0EZXN,Yakima,Moxee,WA,98936,2022,FORD,ESCAPE,Plug-in Hybrid Electric Vehicle (PHEV),Clean Alternative Fuel Vehicle Eligible,38,0,15,197264322,POINT (-120.37951169999997 46.55609000000004),PACIFICORP,53077001702\n",
            "1G1FW6S03J,King,Seattle,WA,98117,2018,CHEVROLET,BOLT EV,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,238,0,36,168549727,POINT (-122.37275999999997 47.689685000000054),CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA),53033003000\n",
            "5YJSA1AC0D,King,Newcastle,WA,98059,2013,TESLA,MODEL S,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,208,69900,41,244891062,POINT (-122.15733999999998 47.487175000000036),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),53033025005\n",
            "1FADP5CU8F,Kitsap,Bremerton,WA,98312,2015,FORD,C-MAX,Plug-in Hybrid Electric Vehicle (PHEV),Not eligible due to low battery range,19,0,26,134915000,POINT (-122.65223 47.57192),PUGET SOUND ENERGY INC,53035081100\n"
          ]
        }
      ],
      "source": [
        "!qsv slice -e 5 data.csv"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fnnBlrZgyBPj"
      },
      "source": [
        "The output is raw CSV, which is hard to read at a glance.\n",
        "\n",
        "We can pipe `qsv slice`'s raw CSV output into `qsv table` for better readability."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ve9UXBJQyL4c"
      },
      "source": [
        "Let's try it out:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "OPD1ykp4yK_8",
        "outputId": "1d55e320-2e33-411e-b4df-6143785a4879"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "VIN (1-10)  County  City       State  Postal Code  Model Year  Make       Model    Electric Vehicle Type                   Clean Alternative Fuel Vehicle (CAFV) Eligibility  Electric Range  Base MSRP  Legislative District  DOL Vehicle ID  Vehicle Location                                Electric Utility                               2020 Census Tract\n",
            "5UXTA6C03P  King    Seattle    WA     98177        2023        BMW        X5       Plug-in Hybrid Electric Vehicle (PHEV)  Clean Alternative Fuel Vehicle Eligible            30              0          36                    218985539       POINT (-122.38242499999996 47.77279000000004)   CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA)   53033001600\n",
            "1FMCU0EZXN  Yakima  Moxee      WA     98936        2022        FORD       ESCAPE   Plug-in Hybrid Electric Vehicle (PHEV)  Clean Alternative Fuel Vehicle Eligible            38              0          15                    197264322       POINT (-120.37951169999997 46.55609000000004)   PACIFICORP                                     53077001702\n",
            "1G1FW6S03J  King    Seattle    WA     98117        2018        CHEVROLET  BOLT EV  Battery Electric Vehicle (BEV)          Clean Alternative Fuel Vehicle Eligible            238             0          36                    168549727       POINT (-122.37275999999997 47.689685000000054)  CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA)   53033003000\n",
            "5YJSA1AC0D  King    Newcastle  WA     98059        2013        TESLA      MODEL S  Battery Electric Vehicle (BEV)          Clean Alternative Fuel Vehicle Eligible            208             69900      41                    244891062       POINT (-122.15733999999998 47.487175000000036)  PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA)  53033025005\n",
            "1FADP5CU8F  Kitsap  Bremerton  WA     98312        2015        FORD       C-MAX    Plug-in Hybrid Electric Vehicle (PHEV)  Not eligible due to low battery range              19              0          26                    134915000       POINT (-122.65223 47.57192)                     PUGET SOUND ENERGY INC                         53035081100\n"
          ]
        }
      ],
      "source": [
        "!qsv slice -e 5 data.csv | qsv table"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FoKIUBc-m44j"
      },
      "source": [
        "<a id=\"3\" name=\"3\"></a>\n",
        "## Part 3: More Resources\n",
        "\n",
        "Want to try other notebooks or share your own? [Make a pull request](https://github.com/jqnatividad/qsv/pulls) to [qsv's notebooks folder](https://github.com/jqnatividad/qsv/tree/master/contrib/notebooks)!\n",
        "\n",
        "Here are some links you may find useful as a reference:\n",
        "\n",
        "- [Source code for qsv commands on GitHub](https://github.com/jqnatividad/qsv/tree/master/src/cmd)\n",
        "- [Discussions forum on GitHub](https://github.com/jqnatividad/qsv/discussions)\n",
        "- [Report an issue](https://github.com/jqnatividad/qsv/issues)\n",
        "- [View and contribute to the wiki](https://github.com/jqnatividad/qsv/wiki)\n",
        "- [qsv on GitHub](https://github.com/jqnatividad/qsv)\n",
        "- [Welcome to Colaboratory](https://colab.research.google.com/)\n"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
